This paper addresses the problem of 3D human pose estimation from a single image. We follow a standard two-step pipeline by first detecting the 2D position of the $N$ body joints, and then using these observations to infer 3D pose. For the first step, we use a recent CNN-based detector. For the second step, most existing approaches perform 2$N$-to-3$N$ regression of the Cartesian joint coordinates. We show that more precise pose estimates can be obtained by representing both the 2D and 3D human poses using $N\times N$ distance matrices, and formulating the problem as a 2D-to-3D distance matrix regression. To learn such a regressor we leverage simple neural network architectures which, by construction, enforce positivity and symmetry of the predicted matrices. The approach also has the advantage of naturally handling missing observations and allowing the position of non-observed joints to be hypothesized. Quantitative results on the HumanEva and Human3.6M datasets demonstrate consistent performance gains over the state of the art. Qualitative evaluation on the in-the-wild images of the LSP dataset, using the regressor learned on Human3.6M, reveals very promising generalization results.
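As a rough illustration of the representation described above, the following sketch builds the $N\times N$ distance matrix of a set of joints and shows one simple way to make an arbitrary matrix symmetric and non-negative. It assumes NumPy; the function names `distance_matrix` and `symmetrize_nonnegative` are hypothetical, and the averaging-plus-clamping step (including the zeroed diagonal) is only a stand-in for the architectural constraints mentioned in the abstract, not the authors' network.

```python
import numpy as np


def distance_matrix(joints):
    """Return the N x N matrix of pairwise Euclidean distances.

    joints: array-like of shape (N, d), with d = 2 for image-plane
    joints or d = 3 for 3D joints.
    """
    joints = np.asarray(joints, dtype=float)
    # Broadcasting gives every pairwise difference, shape (N, N, d).
    diff = joints[:, None, :] - joints[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def symmetrize_nonnegative(raw):
    """Map an arbitrary N x N array to a symmetric, non-negative matrix.

    Assumption for illustration only: the abstract states that the
    networks enforce these properties by construction; here they are
    imposed afterwards by averaging with the transpose and clamping.
    """
    raw = np.asarray(raw, dtype=float)
    sym = 0.5 * (raw + raw.T)      # enforce symmetry
    out = np.maximum(sym, 0.0)     # enforce non-negativity
    np.fill_diagonal(out, 0.0)     # a joint is at distance 0 from itself
    return out


if __name__ == "__main__":
    # Three made-up 2D joints, purely for demonstration.
    joints_2d = [[0.0, 0.0], [3.0, 4.0], [0.0, 4.0]]
    print(distance_matrix(joints_2d))
```

Because the distance matrix depends only on distances between joints, it is unchanged by rotating or translating the pose, which is one intuitive reason such a representation may be easier to regress than raw Cartesian coordinates.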